{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "058b95d9",
   "metadata": {},
   "source": [
    "# Run CleanVision on Torchvision dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a9248629",
   "metadata": {},
   "source": [
    "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/cleanlab/cleanvision/blob/main/docs/source/tutorials/torchvision_dataset.ipynb) "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0066cfe8",
   "metadata": {
    "nbsphinx": "hidden",
    "tags": []
   },
   "outputs": [],
   "source": [
    "!pip install -U pip\n",
    "!pip install \"cleanvision[pytorch] @ git+https://github.com/cleanlab/cleanvision.git\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a7aaca99",
   "metadata": {
    "tags": []
   },
   "source": [
    "**After you install these packages, you may need to restart your notebook runtime before running the rest of this notebook.**"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b605e21f",
   "metadata": {},
   "outputs": [],
   "source": [
    "from torchvision.datasets import CIFAR10\n",
    "from torch.utils.data import ConcatDataset\n",
    "from cleanvision import Imagelab"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6b777a46",
   "metadata": {},
   "source": [
    "### 1. Download dataset and concatenate all splits\n",
    "\n",
    "Since we're interested in generally understanding what issues plague our data, we merge the training and test sets into one larger dataset before running CleanVision. You could alternatively just run the package on these two sets of data separately to obtain two different reports.\n",
    "\n",
    "[CIFAR10](https://www.cs.toronto.edu/~kriz/cifar.html) is classification dataset, but CleanVision can be used to audit images from any type of dataset (including supervised or unsupervised learning).\n",
    "\n",
    "Load all splits of the CIFAR10 dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3d207006",
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "%%capture\n",
    "train_set = CIFAR10(root=\"./\", download=True)\n",
    "test_set = CIFAR10(root=\"./\", train=False, download=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f25425d",
   "metadata": {
    "tags": []
   },
   "source": [
    "Concatenate train and test splits"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c099c758",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset = ConcatDataset([train_set, test_set])"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d8dd1a5b",
   "metadata": {
    "tags": []
   },
   "source": [
    "A sample from the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "245fe0d4",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset[0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "004f6288",
   "metadata": {},
   "source": [
    "Let's look at the first image in this dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c7e75294",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset[0][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f8581a6d",
   "metadata": {},
   "source": [
    "### 2. Run CleanVision"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "09c6440d",
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab = Imagelab(torchvision_dataset=dataset)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "02a34a0a",
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.find_issues()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19b16779",
   "metadata": {
    "tags": []
   },
   "source": [
    "### 3. View Results\n",
    "\n",
    "Get a report of all the issues found"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4e120feb",
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.report()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "782642ad",
   "metadata": {},
   "source": [
    "View more information about each image, such as what types of issues it exhibits and its quality score with respect to each type of issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "d23cc881",
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.issues.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "fcbdba22",
   "metadata": {},
   "source": [
    "Get indices of all **dark** images in the dataset sorted by their dark score."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9b1df575",
   "metadata": {},
   "outputs": [],
   "source": [
    "indices = (\n",
    "    imagelab.issues.query(\"is_dark_issue\").sort_values(by=\"dark_score\").index.tolist()\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "68d50f6c",
   "metadata": {},
   "source": [
    "View the 5th darkest image in the dataset"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fcfc3389",
   "metadata": {},
   "outputs": [],
   "source": [
    "dataset[indices[5]][0]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8430529",
   "metadata": {},
   "source": [
    "View global information about each issue, such as how many images in the dataset suffer from this issue."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "38c37628",
   "metadata": {},
   "outputs": [],
   "source": [
    "imagelab.issue_summary"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "75912aea",
   "metadata": {},
   "source": [
    "**For more detailed guide on how to use CleanVision, check the** [tutorial notebook](tutorial.ipynb)."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.0"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
